PREFACE


The research described in this report, "Markov Models for Multiple Bus
Multiprocessor Systems," UCLA-ENG-8203, by Marco Ajmone Marsan and Mario Gerla, was
carried out as part of the Research in Distributed Processing, sponsored by the
Office of Naval Research, Contract No. N00014-79-C-0866 under the direction of A.
Avizienis, Principal Investigator, B. Bussell, M. Ercegovac, M. Gerla, S. Parker and
D. Rennels, Co-Principal Investigators, in the Computer Science Department, School
of Engineering and Applied Science, University of California, Los Angeles.

ABSTRACT - Markovian models are developed for the performance analysis of
multiprocessor systems intercommunicating via a set of busses. The performance
index is the average number of active processors, called processing power.
From processing power a variety of other performance measures can be derived
as dictated by the specific processor application. Exact models are first
introduced, and are illustrated with a simple example. The computational
complexity of the exact models is shown to increase very rapidly with system
size, thus making the exact analysis impractical even for medium size systems.
To overcome the complexity of computation, several approximate models are
introduced. The approximate results are compared with the exact ones and found
to be surprisingly accurate for a wide range of configurations. Simulation is
used to validate the analytic models and to test their robustness.

1. INTRODUCTION


Tightly connected multiprocessor systems are characterized by the presence
of several processing units and one or more common memory areas, used by
the processors for the exchange of information and, possibly, the storage of
common code and data structures that are used infrequently. Processors and
common memories are connected by some kind of communication system, usually
called an interconnection network.

Early multiprocessor systems were developed using crossbar networks to
connect processors and memories. A widely known crossbar multiprocessor system
is C.mmp, the Carnegie Mellon multiminicomputer [WULF72]. The performance
of crossbar multiprocessors has been widely analyzed in recent years [BHAN75,
BASK76, BCOG77, SETTH77, WILL78].

With the availability of inexpensive microprocessors, multiprocessor
systems with a very large number of components are now becoming feasible and
cost effective. For such systems a crossbar interconnection network may be
intolerably expensive and in general it would provide a bandwidth much higher
than needed. A more attractive alternative is represented by bus-oriented
interconnection networks. Single or multiple bus architectures can be used,
according to the bandwidth required for the specific application. These
interconnection networks are generally called "multiple-bus" or "highway deficient"
[WILL78] networks. Some papers addressing the analysis of bus systems have
appeared very recently in the literature [HOEU77, FUNG78, WILL78].

This report presents exact and approximate Markovian models for the
analysis of multiple-bus multiprocessor systems. Section 2 describes the
basic multiprocessor system investigated in this study. In section 3 the
model for performance analysis is presented and the assumptions on system
operations are discussed. Section 4 derives a variety of application-oriented
performance indices. Section 5 provides an exact model for a simple crossbar
architecture. Section 6 discusses exact models for general multibus
architectures, whereas section 7 derives some approximate, but computationally very
efficient models. In section 8 stochastic Petri net models are introduced.
In section 9 exact and approximate analytic results are compared, and
simulation results are presented.


2. THE MULTIPLE PROCESSOR SYSTEM


This study considers multiple processor systems that exchange information
through a common memory which consists of several modules. Processors
and common memory modules are connected by a set of "global busses". Each
global bus can connect any processor to any memory module. Every processor is
also connected (and has exclusive access) to a private memory. The block
diagram of a system with 3 processors, 3 memory modules and 2 busses is shown in
fig. 1.

The  exchange  of  information  is  accomplished  by  first  writing  the  infor¬ 
mation  in  the  appropriate  common  memory  module  and  then  reading  it  from  the 
destination  processor.  Due  to  the  sharing  of  both  memory  modules  and  busses, 
contention  may  arise,  causing  processors  to  queue  for  a  resource  which  is 
currently in use. If the number of busses b is greater than or equal to the
smaller of the number of processors p and the number of memories m, i.e.
$b \ge \min(m,p)$, then the contention is only caused by the sharing of memory
modules.  Therefore,  a  processor  can  always  find  a  free  bus  to  access  a  free 
common  memory.  If,  on  the  other  hand,  the  inequality  is  not  satisfied,  a  pro¬ 
cessor  may  be  forced  to  wait  for  a  memory  which  is  currently  free  because  no 
bus  is  available. 

Multiple processor systems for which the inequality holds are usually
known as "crossbar" architectures. Note that in general it is not wise to set
b > min(m,p) unless we want to add some redundancy in the interconnection
network for reliability purposes. In fact, the availability of extra busses does
not affect the crossbar system model, nor does it improve its throughput.

Multiple processor systems for which the first inequality does not hold
are usually called "highway deficient" systems or "(multiple) bus"
architectures (where the word multiple is dropped in the case of b=1). For these
systems we assume throughout this report that $p \ge m > b$. The case m > p can be
analyzed using the same techniques described here; the models are generally
simpler than those presented in this report.

It  is  possible  to  construct  a  queueing  network  model  for  the  analysis  of 
both  types  of  systems.  The  general  case  is  shown  in  fig.  2.  Processors  join 
memory queues, and before proceeding to service (i.e. accessing memory) they
must be granted a permit (bus). The permit is returned upon completion of
service.  The  general  model  is  thus  a  closed  queueing  network  with  p  classes 
of  customers  and  with  passive  resources  [CHAN78,  KELL76b],  which  in  this  case 
represent  the  busses.  In  the  case  of  crossbar  architectures  the  presence  of 
busses  can  be  ignored,  thus  making  the  analysis  substantially  simpler  than  for 
multiple  bus  systems. 




3. THE MODEL


Models of multiprocessor systems are developed both to gain a deeper
understanding of their behavior and to obtain a set of performance indices
that can be used to guide the design of actual systems.

A model cannot include all the details of the system; rather, it is an
abstraction of the real system including the features relevant to the
analysis. Different models are generally constructed, depending on the nature
of the application and the degree of detail required by the study. In our
case, the central feature of the system is the overall processing capability
limitation due to the contention for memories and busses. Our models
therefore will focus on the loss of processing power due to this contention.

In general we say that a processor can be in one of three different
states:

(1) The processor can execute in its private memory.

(2) The processor can exchange data with other cooperating processors,
by reading from, or writing into the common memory modules.

(3) The processor can be waiting to access a common memory module.

We say that a processor is ACTIVE when it is in state (1), and the goal
of our analysis is to determine the average percentage of time for which
processors are active.

By introducing an ergodic assumption we can say that the above quantity
is  equal  to  the  average  number  of  active  processors  divided  by  the  total 
number  of  processors.  Such  quantity  is  usually  known  in  the  literature  as 
Processing  Efficiency.  As  the  number  of  processors  is  a  known  constant  we  can 
simply  evaluate  the  average  number  of  active  processors,  called  Processing 
Power  of  the  system  (P). 

P  =  E  [#  active  processors]  (1) 

P  is  the  main  performance  index  considered  in  the  sequel.  Other  impor¬ 
tant  performance  measures  are  simply  related  to  P,  as  shown  in  section  4. 

The  following  assumptions  are  made  regarding  the  operation  of  the  sys¬ 
tem: 


a) Processors perform a background activity that only requires accesses
to the processor's private memory.

b) From time to time processors exchange information, and thus access
the common memory, performing read/write operations.

c) The duration of the access to the common memory is an independent,
exponentially distributed random variable with mean $1/\mu_j$ for the j-th memory
module.

d) When a processor requires access to a common memory module, a path
is immediately established (with zero delay) between the processor and the
referenced memory module, provided that a bus is available and the memory is
not being accessed by another processor.

e) If a path cannot be established the processor idles, waiting for the
necessary resource(s). (This may not be true for multiprocessor systems using
an interrupt mechanism. The hypothesis is conservative anyway.)

f) Upon memory access completion, memory and bus are immediately
released (with zero delay) and the processor resumes its background activity.
The interval between subsequent access requests is an independent,
exponentially distributed random variable with mean $1/\lambda_j$ for the j-th processor.

g) An access request from processor i is directed to memory j with
probability $p_{ij}$. Thus, the access rate from processor i to memory j is defined
as $\lambda_{ij} = \lambda_i p_{ij}$.

The above assumptions guarantee that a Markovian model can be constructed.
Unfortunately, this does not guarantee that a solution (closed form or
numerical) can then be easily obtained. In particular, such models show an
explosion of the number of states when the number of system components is
increased. The analysis rapidly becomes very tedious even for moderately
complex systems.

In order to reduce the number of states we introduce three further
assumptions.

h) All processors are assumed to have equal common memory access rate,
$\lambda$, and all memories are assumed to be equal, so that the average memory access
time is the same for all memories and all processors ($1/\mu$).

i) A uniform reference model is assumed; this implies that every access
request from every processor is directed to any memory with equal probability
1/m, where m is the number of common memory modules.

l) When a bus goes idle, the next processor to use the bus is selected
at random among the heads of the queues referencing memories which have become
free.

In  formulae,  assumptions  h)  and  i)  state  that: 


$$ \lambda_i = \lambda, \quad \mu_j = \mu \qquad \text{all } i, j $$

$$ p_{ij} = \frac{1}{m} \qquad \text{all } i, j \tag{2} $$


With  these  additional  assumptions  we  succeed  in  performing  an  exact 
analysis  of  some  moderately  complex  systems,  but  still  cannot  attack  very 
large  problems. 


The equal processor access rate assumption in h) was shown to be a
conservative one in the single bus case [AJMO80] and is expected to be
conservative also in the more general case of multiple busses.

Processors access the common memory modules to perform either read or
write operations; we do not distinguish between the two operations in our
models, and therefore do not account for the fact that a processor may attempt
to read data which is not present in common memory. This results in the
processor going idle, with consequent throughput reduction. This feature can be
included in the Markovian model, but the state space is greatly expanded. A
more system oriented approach can be pursued, by assuming that a fraction d of
the accesses is for write operations and a fraction (1-d) is for read
operations. A read operation finds the required information with probability q.
Assuming that the access request generation process is not altered by not
finding the desired information, the actual time spent in useful computation
is decreased by a factor (1-q)(1-d). Thus, the actual processing power of the
system is simply obtained by applying the above factor to the computed value
of P. Obviously in this case it is necessary to estimate the values of q and
d, which depend on many system parameters.

Using all the above assumptions we can now construct a Markov chain to
model the behavior of the system.

The state of the Markov chain is defined by the 2p-tuple

$$ (m_1, s_1, m_2, s_2, \ldots, m_p, s_p) \tag{3} $$

where:

$m_i$ is the memory referenced by processor i

$s_i$ is the state of processor i

$m_i$ can take values:

0: processor's private memory
k: k-th common memory module

$s_i$ can take values:

0: active

j: queueing (j-th in queue) for module $m_i$

-1: accessing common memory module $m_i$


This state definition however is not the most convenient from the
computational point of view. In fact, using the theory of "lumpable" Markov chains
[KEME60], we may lump equivalent states and obtain a Markov chain of
substantially smaller size. The lumping technique is illustrated by an example in
section 5.

The state definition and the degree of lumpability of the chain depend
on the policy that is used to assign a free bus to a queueing processor.
Assumption l) is the most convenient from the model complexity point of view,
but might not be the one that yields the best performance. Modifications of
assumption l) will be briefly discussed in the sequel.

4. PERFORMANCE MEASURES


The processing power is not the most appropriate performance index for
some applications. Other parameters could better describe the quality of the
system in some cases. Fortunately, however, many different performance
indices can be simply derived from the processing power.

Define $\lambda^*$ to be the rate at which customers cycle through the queueing
network. From Little's result we have:

$$ \lambda^* = P \lambda \tag{4} $$


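Applying Little's result again to the entire memory system, including
queues and servers, we find the average customer delay D:

$$ D = \frac{p - P}{\lambda^*} = \frac{p - P}{P \lambda} \tag{5} $$

Finally, subtracting from D the average service time $1/\mu$ we have the average
queueing time W:

$$ W = D - \frac{1}{\mu} = \frac{p - P(1+\rho)}{P \lambda} \tag{6} $$

where $\rho = \lambda/\mu$.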


The average number of queued processors is:

$$ N_q = W P \lambda = p - P(1+\rho) \tag{7} $$

therefore the average number of processors accessing common memory modules is:

$$ p - P - N_q = \rho P \tag{8} $$


From the values of average cycle time, average queueing time, average
think time and average service time we can now construct many different
performance indices, depending on the particular application.
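
As a rough illustration of equations (4) through (8), the following Python sketch computes the derived indices from a given processing power. The function name `derived_measures` and the dictionary keys are illustrative choices, not notation from the report; the formulas assume the homogeneous case of assumption h), with a single rate $\lambda$ and a single mean access time $1/\mu$.

```python
def derived_measures(P, p, lam, mu):
    """Derive the indices of equations (4)-(8) from the processing power P.

    P   : processing power (average number of active processors)
    p   : number of processors
    lam : access request rate of one active processor
    mu  : memory service rate (mean access time is 1/mu)
    """
    rho = lam / mu                  # load factor rho = lambda / mu
    throughput = P * lam            # (4): rate at which requests cycle through the network
    delay = (p - P) / throughput    # (5): Little's result on queues plus servers
    wait = delay - 1.0 / mu         # (6): queueing time only
    queued = wait * throughput      # (7): should equal p - P * (1 + rho)
    in_service = p - P - queued     # (8): should equal rho * P
    return {
        "rho": rho,
        "throughput": throughput,
        "delay": delay,
        "wait": wait,
        "queued": queued,
        "in_service": in_service,
    }
```

For example, `derived_measures(P, 2, 1.0, 2.0)` with P taken from the 2x2x2 result of section 5 returns the queueing time and queue length for that system.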


If the processors are simply updating a data base, a reasonable
performance measure could be the ratio of the memory access time to the sum of the
access time plus the waiting time. Using the above results, this performance
index is expressed as follows:
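
As a worked step, and assuming the index is exactly the ratio described in words above (access time over access time plus waiting time), equations (5) and (6) with $\rho = \lambda/\mu$ would give the following. This is a derivation sketch from the stated definitions, not a quotation of the report's own display.

```latex
\frac{1/\mu}{1/\mu + W}
  \;=\; \frac{1/\mu}{D}
  \;=\; \frac{1}{\mu}\cdot\frac{P\lambda}{p - P}
  \;=\; \frac{\rho P}{p - P}
```

The index tends to 1 when queueing is negligible ($p - P \approx \rho P$) and decreases as contention grows.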


If, on the other hand, our multiprocessor system is a packet switch
operating under heavy load conditions, where input processors process packets
and write them into a common memory and output processors read them and again
process them before queueing them for output, then the "think" time represents
the time necessary to process an incoming (outgoing) packet and the service
time represents the time necessary to write (read) a packet from an input
(output) processor (note that the exponential read/write time corresponds to
exponential packet length distribution). The throughput of the packet switch
can then be expressed as:

$$ Z_{ps} = \frac{p}{2C} = \frac{P \lambda}{2} \tag{11} $$

where $C = 1/\lambda + D = p/(P\lambda)$ is the average cycle time.


Note that each packet must be processed by an input and an output processor;
both  operations  require  one  cycle  time,  and  p  packets  can  be  processed  simul¬ 
taneously. 

Performance  indices  for  other  applications  can  be  constructed  in  a  simi¬ 
lar  way. 


5.  CROSSBAR  ARCHITECTURES 


We begin by presenting as an example the simplest non-trivial case, a
2-processor, 2-memory, 2-bus (2x2x2) system. (Note: the even simpler case of
a single bus structure is trivial, and can be analyzed using an M/M/1 queue
with finite population. Extensions of the single bus system to different
processor access rates and general service distributions are found in [AJMO80].)

A px2x2 system is a crossbar multiprocessor and can thus be studied as a
closed queueing network with p classes of customers. Due to the assumptions
introduced the solution can be obtained by application of the product form
solution [BASK75]. We shall nevertheless construct a Markov chain model, as
explained before, to provide a first simple example.

The state definition in the case of a 2x2x2 system is

$$ (m_1, s_1, m_2, s_2) \tag{12} $$

and the Markov chain that we obtain using assumptions a) through g) is shown
in fig. 3a. In this case no lumping is possible. However, if we add
assumptions h) through l) the transition rates are modified as shown in fig. 3b.

We can apply the lumping technique to this Markov chain by defining
macrostates as follows:

(00) = [(0,0,0,0)]

(-10) = [(1,-1,0,0), (2,-1,0,0), (0,0,1,-1), (0,0,2,-1)]

(-11) = [(1,-1,1,1), (2,-1,2,1), (1,1,1,-1), (2,1,2,-1)]

(-1-1) = [(1,-1,2,-1), (2,-1,1,-1)]  (13)

The  lumped  chain  is  shown  in  fig.  4. 

Steady state probabilities for the chain in fig. 4 are now very easily
evaluated, yielding:

$$ P(-10) = 2\rho \, P(00) $$

$$ P(-11) = \rho^2 \, P(00) $$

$$ P(-1-1) = \tfrac{1}{2} \rho^2 \, P(00) \tag{14} $$

$$ P(00) = \left[ 1 + 2\rho + \tfrac{3}{2} \rho^2 \right]^{-1} $$

The processing power P, defined as the average number of active processors, is
obtained as:

$$ P = 2 P(00) + P(-10) = 2(1+\rho) \left[ 1 + 2\rho + \tfrac{3}{2} \rho^2 \right]^{-1} \tag{15} $$
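
The closed-form result for the 2x2x2 system can be cross-checked numerically. The sketch below is an illustration, not code from the report: it writes the generator matrix of a four-macrostate chain, with transition rates inferred from assumptions h) through l) since fig. 4 is not reproduced here, and solves the balance equations with a small pure-Python Gauss-Jordan routine. The names `solve_linear`, `steady_state` and `lumped_2x2x2` are hypothetical helpers.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a generator matrix Q (list of rows)."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]  # transpose of Q
    A[-1] = [1.0] * n  # replace one balance equation by the normalization
    return solve_linear(A, [0.0] * (n - 1) + [1.0])


def lumped_2x2x2(lam, mu):
    """Macrostates in the order (00), (-10), (-11), (-1-1)."""
    Q = [
        [-2 * lam, 2 * lam, 0.0, 0.0],         # (00): either processor issues a request
        [mu, -(mu + lam), lam / 2, lam / 2],   # (-10): access ends, or a second request arrives
        [0.0, mu, -mu, 0.0],                   # (-11): access ends, queued processor is served
        [0.0, 2 * mu, 0.0, -2 * mu],           # (-1-1): one of the two accesses ends
    ]
    pi = steady_state(Q)
    power = 2 * pi[0] + pi[1]                  # equation (15)
    return pi, power
```

Calling `lumped_2x2x2(1.0, 2.0)` corresponds to $\rho = 0.5$; the returned probabilities can be compared term by term with (14).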

As soon as we increase by one the number of processors we realize that
the general description is not practical. We have 49 states in this case,
that we can lump to 6 macrostates as shown in fig. 5.

The  processing  power  is  now  obtained  as: 

P = 3 P(000) + 2 P(-100) + P(-101) + P(-1-10)  (16)


In  the  same  manner  we  get  the  lumped  chain  in  the  case  of  four  proces¬ 
sors  that  is  shown  in  fig.  6. 

In this case we see that in the lumped chain we have two states with two
processors accessing the common memory and two processors in queue. State a
is such that both processors queue for the same memory and state b is such
that the two processors queue for different memories.

Fig. 6 - Lumped Markov chain for the 4x2x2 system.

In the case of a px2x2 system we are not interested in the policy followed to
choose the next processor to be served when a bus becomes available: the only
thing that can be done is to pick one of the processors queueing for the
memory that has become available. (This is true in general for any crossbar
architecture.) The fact that we choose the first in the queue is irrelevant for
the evaluation of the processing power.

We can now draw the lumped chain in the general case of a px2x2 system
(Fig. 7). The number of states of the lumped chain, N, can be evaluated as:


The  number  of  active  processors  associated  to  each  state  is: 

$$ p - n_m - n_{q_1} - n_{q_2} $$

and thus the evaluation of the processing power is straightforward, once the
steady state probabilities associated to the states of the Markov chain are
evaluated.


In the case of a 3 memory, 3 bus system the state of the lumped Markov
chain is defined as:

$$ (n_m, n_{q_1}, n_{q_2}, n_{q_3}), \qquad n_{q_1} \ge n_{q_2} \ge n_{q_3} $$

where

$n_m$, $n_{q_1}$, $n_{q_2}$ are defined as before

$n_{q_3}$ is the number of processors queueing for the third common memory
currently accessed.

The lumped chain that we obtain is now shown in fig. 8. The transition
rates  between  the  states  are  not  shown,  but  can  be  easily  evaluated. 

In the general case of p processors, m memories and m busses ($p \ge m$) the
state of the lumped chain is defined by the (m+1)-tuple

$$ (n_m, n_{q_1}, \ldots, n_{q_m}), \qquad n_{q_1} \ge n_{q_2} \ge \ldots \ge n_{q_m} $$

and the definition of the entries is a straightforward extension of the
previous case.

The structure of the Markov chain is the same as in fig. 8 up to level
3, then more states must be considered.

We can express in the general case the transition rates between two
states, provided that we specify more precisely the state of the Markov chain.

Given a state as in (21), the entries $n_{q_k}$ must be arranged in decreasing
order. We will then have some groups of adjacent entries with the same value.

All the entries of the state can at most increase or decrease by one
unit at a time. Only one entry can change at a time.

Given a group of entries $n_{q_k}, n_{q_{k+1}}, \ldots$, all with the same value,
only the first entry of the group can increase by one unit, and only the last
entry can decrease by one unit. In this manner we are sure to preserve the
entries in decreasing order.


Consider now a state

$$ (i, q_1, \ldots, q_m) \tag{22} $$

this state can evolve into at most 2(m+1) other states, which are identified
by the following transitions:

$$
\begin{aligned}
i &\to i+1 \\
i &\to i-1 && i > 0 \\
q_k &\to q_k+1 && q_k \text{ first entry of a group} \\
q_j &\to q_j-1 && q_j \text{ last entry of a group},\ q_j > 0
\end{aligned}
\tag{23}
$$

The  rates  associated  to  each  of  these  transitions  are: 

$$ R(i \to i+1) = \begin{cases} p \lambda & i = 0 \\ \frac{1}{m} (p-n)(m-i) \lambda & 0 < i < m,\ n < p \end{cases} \tag{24a} $$

$$ R(q_k \to q_k+1) = \frac{l}{m} (p-n) \lambda \qquad i \le m,\ n < p \tag{24b} $$

$$ R(i \to i-1) = (i-s) \mu \qquad i-1 \ge s \tag{24c} $$

$$ R(q_j \to q_j-1) = l \mu \qquad j \le i \tag{24d} $$

where:

$$ n = i + \sum_{k=1}^{m} q_k $$

l = # of entries $q_1, q_2, \ldots, q_m$ that have the same
value of $q_j$ (including $q_j$ itself)  (25)

s = # of nonzero entries $q_1, q_2, \ldots, q_m$

The number of active processors associated to each state is simply p-n;
it is thus very easy to obtain the processing power, once we have solved for
the steady state probability distribution of the Markov chain.

6. MULTIBUS ARCHITECTURES: EXACT MODELS


For multiple bus architectures, the complexity of the Markov chains is
much larger than for crossbar, even when lumping is used. Therefore we can
handle only moderately complex systems using the exact state description. For
the most general case we must resort to approximate models.

The state definition for the exact lumped chain in the case of a
multiple bus system is:

$$ (n_m, q_1, q_2, \ldots, q_m) \tag{26} $$

where

$n_m$ is the number of processors currently accessing a common memory

$q_1, \ldots, q_{n_m}$ are the numbers of processors queueing for the memories
currently accessed, arranged in decreasing order

$q_{n_m+1}, \ldots, q_m$ are the numbers of processors queueing for a free memory,
not accessible because no bus is available, arranged in decreasing order.


Some examples of lumped Markov chains are given in figs. 9 through 13,
for 3x3x2, 4x3x2, 5x3x2, 4x4x2 and 4x4x3 systems, respectively.

Note that an increase in the number of processors and/or memories
complicates the Markov chain, whereas an increase in the number of busses tends
to simplify the Markovian representation. This is due to the fact that the
presence of a higher number of busses makes the system more similar to a
crossbar, and thus reduces the number of possible queueing situations.

When the number of busses is just one less than the number of
processors, the policy for the choice of the next processor to be served is
irrelevant. In the other cases the Markov chain depends on such policy.
Consider for instance a 4x3x2 system where the next processor served is the one
that has been waiting longest. In this case the Markov chain is the one shown
in fig. 14, where an asterisk is added to indicate which customer has
priority. In general, modifications of assumption l) require that more information
about the state of the system queues is recorded in the Markov chain state
description. The resulting chains may thus be much more complex than those
obtained using assumption l).

The general pxmxb case is not easy to handle, even after lumping is
applied. We will therefore introduce in the next section some approximations
which further reduce the size of the Markov chain and permit us to attack the
most general case.
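
For small systems the exact processing power can also be obtained by brute force, without lumping. The sketch below is an illustration under stated assumptions, not the lumped chains of figs. 9 through 13: the state records, for every memory, how many processors reference it and whether it is being accessed, and a freed bus is given at random to one of the memories with waiting processors, which is one reading of assumption l). All function names are hypothetical, and the dense pure-Python solver is only practical for a few hundred states.

```python
from collections import deque


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a generator matrix Q."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    return solve_linear(A, [0.0] * (n - 1) + [1.0])


def transitions(state, p, m, b, lam, mu):
    """Yield (next_state, rate) pairs; state = (counts per memory, serving flags)."""
    counts, serving = state
    n = sum(counts)
    if n < p:  # an active processor issues a request (uniform reference model)
        for j in range(m):
            c, s = list(counts), list(serving)
            c[j] += 1
            if counts[j] == 0 and sum(serving) < b:
                s[j] = 1  # memory and a bus are free: the access starts at once
            yield (tuple(c), tuple(s)), (p - n) * lam / m
    for j in range(m):  # an access to memory j completes
        if serving[j]:
            c, s = list(counts), list(serving)
            c[j] -= 1
            s[j] = 0
            waiting = [k for k in range(m) if c[k] > 0 and not s[k]]
            if not waiting:
                yield (tuple(c), tuple(s)), mu
            for k in waiting:  # freed bus goes to a random waiting queue head
                s2 = list(s)
                s2[k] = 1
                yield (tuple(c), tuple(s2)), mu / len(waiting)


def processing_power(p, m, b, lam, mu):
    """Average number of active processors for a p x m x b system."""
    start = ((0,) * m, (0,) * m)
    index, order, todo, edges = {start: 0}, [start], deque([start]), []
    while todo:  # breadth-first enumeration of reachable states
        state = todo.popleft()
        for nxt, rate in transitions(state, p, m, b, lam, mu):
            if nxt not in index:
                index[nxt] = len(order)
                order.append(nxt)
                todo.append(nxt)
            edges.append((index[state], index[nxt], rate))
    Q = [[0.0] * len(order) for _ in order]
    for i, j, rate in edges:
        Q[i][j] += rate
        Q[i][i] -= rate
    pi = steady_state(Q)
    return sum(prob * (p - sum(counts)) for prob, (counts, _) in zip(pi, order))
```

With b = m the routine reduces to the crossbar case, and with b = 1 the total number of non-active processors follows a finite-population M/M/1 queue, which gives two simple consistency checks.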


Fig. 9 - Lumped Markov chain for the 3x3x2 system.


Fig. 10 - Lumped Markov chain for the 4x3x2 system.

Fig. 12 - Lumped Markov chain for the 4x4x2 system.

Fig. 14 - Lumped Markov chain for the 4x3x2 system using a FCFS discipline.


7. MULTIBUS ARCHITECTURES: APPROXIMATE MODELS

The reason for the introduction of approximate Markovian models is that,
for general multibus systems, the number of states increases very rapidly with
system size. The explosive growth is due to the detailed information that the
states must record about the queues inside the system. In particular for each
state of the Markov chain the number of customers queued for all common memory
modules must be recorded. That is, we not only need to know the number of the
queued customers, but also must be concerned with all the possible ways of
distributing these customers among the system queues. If we reduce the amount
of information about the status of the queues we no longer have a first order
Markov chain behavior in the evolution of the system through the state space.
The approximate Markov models that we introduce in this section analyze the
system behavior by assuming that the transitions between the states with
reduced queueing information still satisfy the Markov property. The results
that we will obtain in this way are approximate and must then be compared to
the exact ones to test their accuracy.

In order to define a simplified model, one needs to specify:

a) the state definition, that is the amount of information used to
describe the state of the Markov chain. As was mentioned before we will use
reduced information about the queues in the system.

b) the method to calculate the transition rates for the simplified
Markov model. As the behavior is approximated by the simplified Markov chain the
transition rates must be evaluated according to some empirical rule, and
several different rules can be envisioned.

Three different state definitions (named A, B and C) and two heuristic
methods for the evaluation of the transition rates (named 1 and 2) were
considered. The approximate models are named using the letter referring to the
state description and the number referring to the evaluation of the transition
rates.


Let  us  first  begin  with  a  very  simple  model: 


Model A1 - The state of the system is simply represented by the
total number of processors waiting either for a busy memory or for a
busy bus, and by the number of processors currently accessing a
common memory module. We thus have a pair

$$ (n_m, n_q) \tag{27} $$

where

$n_m$ = # of processors in service
$n_q$ = # of processors queued

The transition rates are evaluated by assuming that each active
processor can request any memory module with the same probability
(uniform reference model). Furthermore, each queued processor is
assumed to request, with uniform probability, any of the common memory
modules currently not accessible (this approximation implies that a
queued processor can randomly reselect a new memory when a memory or
bus becomes unblocked).
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If  we  apply  this  approximation  to  the  2x2x2  system  and  to  the  3x2x2  sys¬ 
tem  we  find  again  the  exact  (lunped)  chains.  In  other  words,  the  above  as- 
sunptions  are  automatically  verified  in  such  snail  systems,  and  therefore  no 
approximation  is  introduced . 

Consider now a 4x2x2 system: in this case we have two states in which two
processors are queued. Our approximate chain will consider these two states
as a single one. Note, however, that the merging violates the conditions for
lumping. Some error will, therefore, appear in the results due to such
"prohibited" lumping. The chain that we get is shown in fig. 15.

This  approximation  can  be  extended  very  easily  to  the  px2x2  system,  and 
the  resulting  chain  is  shown  in  fig.  16.  The  number  of  states  N  is  in  this 
case  only  twice  the  number  of  processors. 

To illustrate the rate computation, consider states (2,p-2) and (1,p-2).
The rate from (2,p-2) to (1,p-2) is evaluated by multiplying the rate out of
state (2,p-2), which is $2\mu$, by the probability that none of the p-2
queueing processors is referencing the memory that becomes free. Such
probability is $(1/2)^{p-2}$.

Carrying out the analysis for the most general case, we find that the pxmxb
system is represented by a Markov chain with b vertical chains (see fig. 17)
and a total number of states N, where:

$$ N = 1 + b\left[p + \tfrac{1}{2}(1-b)\right] \tag{28} $$

A simple upper bound on N is $N \le pb + 1$.

Fig. 17 - Chain of the pxmxb system with the approximate model A1
(b vertical chains).
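As a quick check of the state count, the following minimal sketch evaluates
equation (28), read here as N = 1 + b[p + (1-b)/2], for the system sizes
quoted in section 9. The function name `a1_state_count` is an illustrative
choice, not part of the report.

```python
def a1_state_count(p: int, b: int) -> int:
    """Number of states of the model A1 chain, N = 1 + b*[p + (1 - b)/2].

    p is the number of processors and b the number of busses; the number of
    memories m does not enter the count.
    """
    if p < 1 or b < 1:
        raise ValueError("p and b must be positive")
    # b*(b-1) is always even, so the integer division below is exact.
    return 1 + b * p - (b * (b - 1)) // 2


if __name__ == "__main__":
    # (p, m, b) triples taken from the results section of the report.
    for p, m, b in [(4, 3, 2), (4, 4, 3), (6, 4, 2), (16, 8, 3)]:
        n = a1_state_count(p, b)
        print(f"{p}x{m}x{b}: N = {n}, upper bound pb+1 = {p * b + 1}")
```

Under this reading the counts are 8, 10, 12 and 46 states, which agree with
the A1 chain sizes reported in section 9, and each stays below pb + 1.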


The transition rates can be explicitly written for the most general case.
Their derivation is reported in appendix 1. Since the number of active
processors is p-i-j, the processing power can be simply evaluated once the
steady state distribution of the Markov chain is known.

Next we introduce a modification of model A1, by specifying a different
method for the calculation of the transition rates:

Model A2 - The state of the system is defined as in model A1. The
transition rates are evaluated using an "averaging" technique.

We  describe  the  model  A2  using  a  3x3x2  system  as  an  example. 

The exact lumped chain for the 3x3x2 system is shown in fig. 9. Using our
approximation, the states (2100) and (2001) are merged into state (2,1),
even if this violates the lumping conditions. In the approximate chain all
the transition rates are unchanged, except for those in and out of state
(2,1). Namely, the rates into state (2,1) are obtained by adding the rates
into the two merged states. The rates out of state (2,1) are obtained by
noting that the total rate out must be $2\mu$, and that the summed rate out
of the two merged states is $\mu$ towards state (1,1) and $\mu + 2\mu$
towards state (2,0). We thus average the rate out of state (2,1), keeping
the same ratio. The resulting chain is shown in fig. 18.
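The averaging step can be sketched as follows. This is only an illustration
under stated assumptions: the summed out-rates of the two merged states are
taken to be mu towards (1,1) and mu + 2 mu towards (2,0), the way these rates
are split between the two merged states is a guess made for the example, and
the helper name `average_out_rates` is hypothetical.

```python
from fractions import Fraction


def average_out_rates(merged_rates, total_out):
    """Sum the out-rates of the merged states per target, then rescale them
    so that they add up to total_out while keeping the same ratio."""
    summed = {}
    for rates in merged_rates:
        for target, rate in rates.items():
            summed[target] = summed.get(target, Fraction(0)) + rate
    scale = Fraction(total_out) / sum(summed.values())
    return {target: rate * scale for target, rate in summed.items()}


mu = Fraction(1)  # memory access completion rate, set to 1 for the example

# Assumed split of the out-rates between the two merged states of the 3x3x2
# chain; only the sums (mu and mu + 2*mu) are taken from the text.
first_merged = {(2, 0): mu, (1, 1): mu}
second_merged = {(2, 0): 2 * mu}

averaged = average_out_rates([first_merged, second_merged], total_out=2 * mu)

if __name__ == "__main__":
    for target, rate in sorted(averaged.items()):
        print(f"(2,1) -> {target}: {rate} * mu")
```

With these inputs the 1:3 ratio is preserved and the total is 2 mu, so the
averaged rates are mu/2 towards (1,1) and 3 mu/2 towards (2,0).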

Note that no error is made in the approximation if the merged states have
equal steady state probability. Otherwise the resulting chain only
approximates the exact one.

The px2x2 system, using this approximation, is represented by the chain of
fig. 19.

The more general case of p processors, m memories, 2 busses can still be
handled, provided that we solve the combinatorial problem of counting the
number of states at each level of the exact lumped chain. The level of the
state is defined as the sum of the number of processors accessing common
memory and the number of queued processors. There is only one state at
levels 0 and 1, and there are two states at level 2. For levels larger than
two we have one state with $n_m = 1$ and n(m,2,k) states with $n_m = 2$.
The expression of n(m,2,k) is derived in appendix 2. The approximate chain
in the case of a pxmx2 system is shown in fig. 20.

The extension to the general pxmxb system, with an arbitrary number of
busses, requires the counting of the states at each level of a more complex
Markov chain, and the corresponding evaluation of new transition rates.


We  now  consider  another  definition  of  system  state  (yet  retaining  the 
rate  computation  rule  of  model  A2): 

Model B2 - The state of the system is represented by the following
triplet: (1) the number of processors accessing a common memory module;
(2) the total number of processors waiting either for a busy memory or for
a busy bus; and (3) a flag which is set to zero when no processor is queued
for a bus, and is set to one when one or more processors are queued for a
bus in order to access a free common memory module.

Namely, the state is defined by the triplet

$$ (n_m, n_q, f) \tag{29} $$

where

$n_m$ = # of processors accessing common memory
$n_q$ = # of queueing processors
f = flag: 0  no queue for a bus
          1  one or more processors are queued for a bus

The transition rates are evaluated using the averaging technique described
in the approximation A2.

Clearly, model B2 is a refinement of A2, since the state is improved by
adding one bit of information concerning the system queues.

We immediately recognize that for crossbar architectures the B2
approximation is the same as the A2 approximation, since the flag is always
zero (no wait for a bus).

Consider now a 4x3x2 system: the approximate chain is shown in fig. 21. If
we compare this chain with the exact lumped chain of fig. 10, we see that
four states have been merged into two, violating the lumpability
conditions. The new transition rates are computed using the averaging
technique. The approximate Markov chain for a 5x3x2 system is shown in
fig. 22.

The general pxmx2 case can be managed by using the combinatorial results of
appendix 2. The resulting chain is shown in fig. 23. The total number of
states is in this case

$$ N = 3(p-1) + 1 \tag{30} $$

Fig. 22 - Chain of the 5x3x2 system with the approximate model B2.

Fig. 23 - Chain of the pxmx2 system with the approximate model B2.


As an example the px3x2 chain is shown in fig. 24. In this particular case
the combinatorial results can be put in polynomial form (see appendix 2).

The number of active processors associated to each state is
$p - n_m - n_q$; thus the processing power can be easily computed once the
steady state probability distribution of the chain is known.

All the preceding approximate models lack one feature which is very
desirable in all analytic models: namely, a closed form solution. We
introduce here the simplest possible model, which provides us with a closed
form solution.

Model C2 - The system state is simply the number of active processors: no
account is kept of the state of internal queues. The transition rates are
evaluated using the averaging technique.


The transition diagram in the case of a pxmx2 system is shown in fig. 25.
We have reduced the system description to a birth and death Markov chain,
whose solution is easily obtained: denote by $\pi(i)$ the steady state
probability of state i, then

[Equation (31), giving $\pi(i)$, and the companion expression for $\pi(p)$
are illegible in the source.]

where n(m,2,i) is defined in appendix 2. The processing power can then be
expressed as:

[Equation illegible in the source.]

Fig. 24 - Chain of the px3x2 system with the approximate model B2.

The general pxmxb case can also be solved. The resulting Markov chain is
shown in fig. 26. The steady state probabilities are in this case:

[Equation illegible in the source; it involves a coefficient B, defined for
i > 1 as a ratio of sums of the quantities $P_k(j)$.]

where $P_k(j)$ is defined in appendix 2.

The expression of the processing power P is then as follows:

[Equation illegible in the source.]
8. STOCHASTIC PETRI NET MODELS

Petri net models and derivations thereof [PETR66, HOLT68, HOLT70, PETR73]
have been introduced by several authors for the modeling of computer
systems [NOE71, NUTT72, NOE73, KELL76b, PETE77, AGER79, SHAP79]. Although in
standard Petri nets no measure of time is considered (only a partial
ordering of the occurrences of events is established), some of the models
presented in the literature allow a measure of the flow of time by
introducing the concept of transition times. Transition times are assumed
to be deterministic, even in the Random Petri net models introduced by
Shapiro [SHAP79]. Molloy [MOLL80] first introduced the idea of random
transition times, by allowing them to be exponentially distributed random
variables. We show in this section how such models can be used to describe
the behavior of multiple bus multiprocessor systems and to obtain the
Markovian models discussed in the previous sections.

For an introduction to Petri nets the reader is referred to the tutorial
papers by Peterson and Agerwala [PETE77, AGER79].

Following [AGER79] we define a Petri net (PN) to be represented by a
bipartite, directed graph: PN = (T,P,A), where:

$$
\begin{aligned}
T &= \{t_1, t_2, \ldots, t_n\} && \text{is a set of transitions} \\
P &= \{p_1, p_2, \ldots, p_m\} && \text{is a set of places} \\
A &\subseteq (T \times P) \cup (P \times T) && \text{is a set of directed arcs}
\end{aligned}
\tag{37}
$$

The set $T \cup P$ forms the set of nodes of the Petri net.

The dynamic properties of the PN can be studied by analyzing the movements
of tokens inside the net. A PN with tokens is a marked Petri net MPN =
(T,P,A,M). A marking M of a PN assigns tokens to places; M can be viewed as
a vector whose i-th component represents the number of tokens assigned to
the i-th place $p_i$. A marking can also be viewed as a mapping from the set
of places P to the natural numbers I:

$$ M : P \to I, \qquad M = \{\mu_1, \mu_2, \ldots, \mu_m\} \tag{38} $$

It is common practice to represent places by circles, transitions by bars
and tokens by black dots. A simple Petri net is shown in fig. 27.

For a given transition t we define the set of input places I(t) as:

$$ I(t) = \{\, p \mid (p,t) \in A \,\} \tag{39} $$

In a similar manner the set of output places is defined as:

$$ O(t) = \{\, p \mid (t,p) \in A \,\} \tag{40} $$

A transition is enabled if the marking M of the Petri net is such that:

$$ M(p) > 0 \quad \text{for all } p \in I(t) \tag{41} $$

Enabled transitions can fire, thus removing one token from each input place
and putting one token in each output place. The firing of a transition
alters the marking of the PN and may then enable other transitions. The
dynamic behavior of the Petri net can thus be investigated by studying the
sequences of markings produced by firing the transitions.
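The enabling and firing rules can be sketched in a few lines. The toy net
below (places "a", "b", "c" and one transition with I(t) = {a, b} and
O(t) = {c}) is a made-up example, not one of the nets of this report, and
the function names are illustrative.

```python
def is_enabled(marking, input_places):
    """A transition is enabled when every input place holds a token."""
    return all(marking[place] > 0 for place in input_places)


def fire(marking, input_places, output_places):
    """Return the marking obtained by firing an enabled transition:
    one token is removed from each input place and one token is added
    to each output place. The original marking is left untouched."""
    if not is_enabled(marking, input_places):
        raise ValueError("transition is not enabled in this marking")
    new_marking = dict(marking)
    for place in input_places:
        new_marking[place] -= 1
    for place in output_places:
        new_marking[place] += 1
    return new_marking


# Hypothetical toy net: t takes one token from "a" and "b", puts one in "c".
initial = {"a": 1, "b": 1, "c": 0}
after = fire(initial, input_places=("a", "b"), output_places=("c",))

if __name__ == "__main__":
    print(after)
    print(is_enabled(after, ("a", "b")))
```

After the firing both input places are empty, so the transition is no longer
enabled in the new marking.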

Standard Petri net models do not consider time as a parameter of the net;
the firing of a transition is assumed to be instantaneous. Modified models
(see for instance the E-net models [NUTT72, NOE73]) allow the introduction
of fixed transition times. With stochastic Petri nets the transition times
are assumed to be exponentially distributed random variables (possibly with
zero mean, thus accounting for immediate transitions). More precisely, the
time that elapses between the enabling and the firing of a transition is an
exponentially distributed random variable; the firing time is still assumed
to be zero, thus in the case of two conflicting transitions the firing of
one disables the other.

A continuous time stochastic Petri net (SPN) is thus an extension of the
standard Petri net:

$$ \mathrm{SPN} = (P, T, A, M, \delta) \tag{42} $$

where $\delta$ is the set of the transition rates associated to each
transition:

$$ \delta = \{\delta_1, \delta_2, \ldots, \delta_n\} \tag{43} $$

A discrete time SPN can also be introduced, by considering geometrically
distributed transition times [MOLL80].

Petri nets are useful in modeling asynchronous concurrent activities in
real systems. We can attach a physical interpretation to markings and
transitions: a marking can represent the state of the system and a
transition can represent an event which modifies the system state. Consider
for example the very simple system of fig. 28: two processors access an
external common memory. The behavior of this system can be represented by
the PN of fig. 27, by giving the following interpretation to places and
transitions:

$p_1$   processor 1 active
$p_2$   processor 1 accessing common memory
$p_3$   bus available
$p_4$   processor 2 active
$p_5$   processor 2 accessing common memory
$t_1$   processor 1 seizes the bus
$t_2$   processor 1 releases the bus
$t_3$   processor 2 releases the bus
$t_4$   processor 2 seizes the bus

Fig. 28 - Two-processor system

With this model we represent the possible conflicts in access requests, but
do not explicitly model the queueing of a processor in order to access the
common memory. This feature can be obtained by adding two places and two
transitions to the net as shown in fig. 29. The interpretation of the added
nodes is:

$p_6$   processor 1 queued
$p_7$   processor 2 queued
$t_5$   processor 1 issues a request
$t_6$   processor 2 issues a request

The marking shown in the figures indicates the initial state of the system.
In order to obtain the full definition of the stochastic Petri net we must
associate a rate with each transition. Using the same notation as in
section 3 we have:

$$
\begin{aligned}
\delta_1 = \delta_4 &= \infty && \text{immediate transition} \\
\delta_2 = \delta_3 &= \mu && \text{memory access completion rate} \\
\delta_5 &= \lambda_1 && \text{access rate of processor 1} \\
\delta_6 &= \lambda_2 && \text{access rate of processor 2}
\end{aligned}
\tag{44}
$$

The analysis of Petri net models is usually based on the properties of the
reachability set associated to the PN. The reachability set of a PN is the
set of all markings reachable from the initial marking M. A marking M' is
immediately reachable from M if it can be obtained from M by firing some
enabled transition. A marking M' is reachable from M if it is immediately
reachable from M or if it is reachable from any marking immediately
reachable from M. The reachability set of the SPN of fig. 29 is easily
obtained, and it is shown in fig. 30. Marking 8 is somewhat different from
all the others, as it is obtained from markings 2 and 4 by firing a finite
rate transition before an immediate transition. Marking 8 is therefore
reachable with probability zero.

The number of tokens in any place can be at most one for all markings. This
means that the SPN is safe. A place in a PN is said to be safe if it
contains at most one token; if all places of a PN are safe, then the PN is
safe. We also note that all transitions are such that for each marking M,
there is a marking M', reachable from M, in which the transition is
enabled. This means that all the transitions in the net are live, hence the
PN itself is live. Liveness is an important property as it guarantees that
the PN is deadlock-free.
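The reachability set and the safeness check can be illustrated with a
breadth-first enumeration of markings. The figure of the two-processor net
with queueing is not reproduced here, so the arcs below are an assumption
inferred from the interpretation of places p1..p7 and transitions t1..t6
given in the text; the variable and function names are illustrative.

```python
from collections import deque

PLACES = ("p1", "p2", "p3", "p4", "p5", "p6", "p7")

# transition -> (input places, output places); arcs are ASSUMED from the
# textual interpretation of the places and transitions.
NET = {
    "t1": (("p6", "p3"), ("p2",)),       # processor 1 seizes the bus
    "t2": (("p2",), ("p1", "p3")),       # processor 1 releases the bus
    "t3": (("p5",), ("p4", "p3")),       # processor 2 releases the bus
    "t4": (("p7", "p3"), ("p5",)),       # processor 2 seizes the bus
    "t5": (("p1",), ("p6",)),            # processor 1 issues a request
    "t6": (("p4",), ("p7",)),            # processor 2 issues a request
}

# Initial marking: both processors active, bus available.
INITIAL = (1, 0, 1, 1, 0, 0, 0)


def successors(marking):
    """Yield (transition, new marking) for every enabled transition."""
    for name, (inputs, outputs) in NET.items():
        in_idx = [PLACES.index(p) for p in inputs]
        if all(marking[i] > 0 for i in in_idx):
            new = list(marking)
            for i in in_idx:
                new[i] -= 1
            for p in outputs:
                new[PLACES.index(p)] += 1
            yield name, tuple(new)


def reachability_set(initial):
    """Breadth-first enumeration of all markings reachable from initial."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for _, nxt in successors(marking):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


markings = reachability_set(INITIAL)
is_safe = all(max(m) <= 1 for m in markings)

if __name__ == "__main__":
    print(len(markings), "reachable markings; safe:", is_safe)
```

With the assumed arcs the enumeration finds 8 markings, none of which puts
more than one token in a place.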

Fig. 30 - Reachability set of the Stochastic Petri net of fig. 29

Due to the memoryless property of the negative exponential distribution,
the SPN is isomorphic to a continuous time Markov chain as shown by Molloy
[MOLL80]. The state space of the Markov chain can be obtained from the
reachability set by eliminating those markings that enable an immediate
transition ($\delta_i = \infty$). In the case of the SPN of fig. 29 we must
eliminate markings 2 and 4 that enable $t_1$ and $t_4$, respectively, and
marking 8 that enables both. We thus have a 5-state Markov chain that can be
represented with the transition diagram of fig. 31, where the state
definition is as follows:

$$ (s_1, s_2) \tag{45} $$

with:

$s_i$ = state of processor i
w = active
a = accessing common memory
q = queued

The marking that corresponds to the state is also indicated in the figure.
The transition rates are those associated with the transition that has to be
fired in order to go from one state to the other. In the case of immediate
transitions, we consider the state where the immediate transition is enabled
to coincide with the state resulting from the firing of the immediate
transition.
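The 5-state chain can be solved numerically with a small exact linear
solver, as sketched below. The transition list is an assumption
reconstructed from the state definition (w, a, q) and from the rule that
immediate transitions are folded into the state they lead to; the rate
values lambda_1 = lambda_2 = mu = 1 are arbitrary, and the function name
`steady_state` is illustrative.

```python
from fractions import Fraction


def steady_state(states, rates):
    """Solve the global balance equations plus normalization by
    Gauss-Jordan elimination over exact fractions.

    rates maps (source, destination) to the transition rate.
    """
    n = len(states)
    index = {s: k for k, s in enumerate(states)}
    # Row k holds the balance equation of state k; last column is the RHS.
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for (src, dst), rate in rates.items():
        i, j = index[src], index[dst]
        a[j][i] += rate   # probability flow into dst coming from src
        a[i][i] -= rate   # probability flow out of src
    # One balance equation is redundant: replace it by sum(pi) = 1.
    a[n - 1] = [Fraction(1)] * (n + 1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return {s: a[index[s]][n] for s in states}


lam1 = lam2 = mu = Fraction(1)            # arbitrary example values
STATES = ["ww", "aw", "wa", "aq", "qa"]   # (s1, s2) written as two letters
RATES = {                                  # ASSUMED from the state definition
    ("ww", "aw"): lam1, ("ww", "wa"): lam2,
    ("aw", "ww"): mu, ("aw", "aq"): lam2,
    ("wa", "ww"): mu, ("wa", "qa"): lam1,
    ("aq", "wa"): mu,   # processor 1 releases, processor 2 seizes at once
    ("qa", "aw"): mu,   # processor 2 releases, processor 1 seizes at once
}

pi = steady_state(STATES, RATES)
# Processing power: average number of active ("w") processors.
power = sum(prob * state.count("w") for state, prob in pi.items())

if __name__ == "__main__":
    print({s: str(p) for s, p in pi.items()}, "processing power:", power)
```

For equal rates the five states turn out to be equally likely, which gives a
processing power of 4/5 for this example.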

Consider a 2x2x2 system, as described in section 5. We can represent the
behavior of such system using the SPN of fig. 32. The interpretation of
places and transitions is a simple extension from fig. 29. The transition
rates are:

$$
\begin{aligned}
\delta_1 &= \lambda_{11} \\
\delta_2 &= \lambda_{12} \\
\delta_3 = \delta_4 = \delta_9 = \delta_{10} &= \infty \\
\delta_5 = \delta_{11} &= \mu_1 \\
\delta_6 = \delta_{12} &= \mu_2 \\
\delta_7 &= \lambda_{21} \\
\delta_8 &= \lambda_{22}
\end{aligned}
\tag{46}
$$

Fig. 32 - Stochastic Petri net model of a 2x2x2 system.

The reachability set is now shown in fig. 33; 23 markings are possible, 4
are reachable with probability zero, 8 of them enable immediate transitions,
hence the associated Markov chain has eleven states. The construction of the
Markov chain using the rates associated to each transition yields exactly
the chain of fig. 3a. From the SPN description of the system we can obtain
the Markov chain description presented in the previous sections.

Note that the stochastic Petri net of fig. 32 is safe and live, hence the
system (as modeled) is deadlock-free.

Marking  p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12

   1      1  0  0  0  0  1  1  1  0  0   0   0
   2      0  1  0  0  0  1  1  1  0  0   0   0
   3      0  0  1  0  0  1  1  1  0  0   0   0
   4      1  0  0  0  0  1  1  0  1  0   0   0
   5      1  0  0  0  0  1  1  0  0  1   0   0
   6      0  0  0  1  0  0  1  1  0  0   0   0
   7      0  1  0  0  0  1  1  0  1  0   0   0
   8      0  1  0  0  0  1  1  0  0  1   0   0
   9      0  0  0  0  1  1  0  1  0  0   0   0
  10      0  0  1  0  0  1  1  0  1  0   0   0
  11      0  0  1  0  0  1  1  0  0  1   0   0
  12      1  0  0  0  0  0  1  0  0  0   1   0
  13      1  0  0  0  0  1  0  0  0  0   0   1
  14      0  0  0  1  0  0  1  0  1  0   0   0
  15      0  0  0  1  0  0  1  0  0  1   0   0
  16      0  1  0  0  0  0  1  0  0  0   1   0
  17      0  1  0  0  0  1  0  0  0  0   0   1
  18      0  0  0  0  1  1  0  0  1  0   0   0
  19      0  0  0  0  1  1  0  0  0  1   0   0
  20      0  0  1  0  0  0  1  0  0  0   1   0
  21      0  0  1  0  0  1  0  0  0  0   0   1
  22      0  0  0  1  0  0  0  0  0  0   0   1
  23      0  0  0  0  1  0  0  0  0  0   1   0

Fig. 33 - Reachability set of the Stochastic Petri net of fig. 32

Petri nets have been used to describe and model the synchronization of
events. In the case of multiprocessor systems that exchange messages through
common memories, processors are synchronized in the sense that a message can
be read only after it has been written. As we mentioned in section 3, a
processor may look for a message in a common memory and not find it.
Moreover, the common memory area is limited: it can accommodate only a fixed
number of messages (assume that the common memory consists of several
buffers which can accommodate one message each). These features of the real
system can be included in the SPN model rather easily. Consider again the
simple system of fig. 28. The message exchange through finite size memories
can be modeled explicitly using the SPN of fig. 34, where the interpretation
of places is as follows:

$p_1$ ($p_{11}$)   processor 1(2) active
$p_2$ ($p_{12}$)   processor 1(2) queued for write
$p_3$ ($p_{13}$)   processor 1(2) queued for read
$p_4$ ($p_{14}$)   processor 1(2) testing the availability of buffers
$p_5$ ($p_{15}$)   processor 1(2) testing the presence of messages
$p_6$ ($p_{16}$)   processor 1(2) writing
$p_7$ ($p_{17}$)   processor 1(2) reading
$p_8$ ($p_9$)      messages for processor 1(2)
$p_{10}$           bus available
$p_{18}$           buffers in common memory

Fig. 34 - Stochastic Petri net model of the two-processor system including
synchronization and buffer size.
The interpretation of the transitions is:

$t_1$ ($t_{11}$)      proc. 1(2) issues a write request
$t_2$ ($t_{12}$)      proc. 1(2) issues a read request
$t_3$ ($t_{13}$)      proc. 1(2) seizes the bus for write
$t_4$ ($t_{14}$)      proc. 1(2) seizes the bus for read
$t_5$ ($t_{15}$)      proc. 1(2) found no message
$t_6$ ($t_{16}$)      proc. 1(2) found no buffer
$t_7$ ($t_{17}$)      proc. 1(2) found a message
$t_8$ ($t_{18}$)      proc. 1(2) found a buffer
$t_9$ ($t_{19}$)      proc. 1(2) write ends
$t_{10}$ ($t_{20}$)   proc. 1(2) read ends

The immediate transitions in the SPN are:

$t_3, t_4, t_7, t_8, t_{13}, t_{14}, t_{17}, t_{18}$

To all other transitions we can assign finite rates, according to the
definitions of section 3. The SPN is live, thus the system is deadlock-free,
but, in general, it is not safe, as places $p_8$, $p_9$ and $p_{18}$ can
contain more than one token at a time, unless the common memory consists of
a single buffer. The SPN is however k-bounded, that is, for each marking the
number of tokens in any place of the network is at most k, k being the
number of buffers available in the common memory. The k-boundedness of the
SPN guarantees that the reachability set is finite. Since the size of the
state space of the equivalent Markov chain is smaller than or equal to the
size of the reachability set, it too is finite.

The  Petri  net  model  provides  a  formal  description  of  the  operation  of 
the  system:  from  fig.  34  and  the  interpretation  of  places  and  transitions,  we 
obtain  all  the  information  necessary  to  describe  the  way  the  system  operates. 

The SPN is in this case much more complex than in fig. 29, where we did not
explicitly model the synchronization between transmitting and receiving
processors; 75 markings are reachable in the single buffer case.
Nevertheless, from fig. 34, we can obtain a Markov chain that models the
behavior of the system including those features, using the same rules as
before. The complexity of the result limits the applicability of these
highly detailed models to very small systems.

9. RESULTS


Exact and approximate analytic results were compared by considering a
4x3x2, a 4x4x3 and a 6x4x2 system respectively. The exact chains for the
first two systems are shown in fig. 6 and 8, respectively. The exact chain
for the third system (not shown here) has 37 states.

The results for the 4x3x2 system are presented in fig. 35. The first column
gives the value of $\rho = \lambda/\mu$, the second column shows the exact
value of processing power as a function of $\rho$, evaluated using the exact
lumped chain. The other columns show the percentage error which affects the
processing power value computed with each of the four approximations
introduced in this report. For this case, the exact chain has 12 states,
approximations A1 and A2 have 8, approximation B2 has 10 and approximation
C2 has 5 states.

The results for the 4x4x3 system are shown in fig. 36, using the same
format. The exact chain has again 12 states, whereas the approximate chains
have 10, 10, 11 and 5 states, respectively.

In fig. 37 the results for the 6x4x2 system are presented. The numbers of
states are in this case 37, 12, 12, 16 and 7.

A number of observations can be made based on these results. Firstly,
approximations A1, A2 and B2 seem to yield upper bounds on the processing
power, whereas C2 gives a lower bound. The upper bound can be intuitively
explained for approximation A1, since the random redistributing of
processors to memories tends to relieve memory congestion and therefore
improve performance. The bounds seem to be rather tight, since percentage
errors well below 10% were typically observed (except for approximation A2
in the 4x4x3 case). Tight upper and lower bounds are extremely useful, as
they allow one to determine a small range in which the exact result must
lie, avoiding the computational complexity of the exact problem.

Fig. 35 - Results for the 4x3x2 system (columns: $\rho$, exact, A1, A2, B2,
C2).

Fig. 36 - Results for the 4x4x3 system.

Fig. 37 - Results for the 6x4x2 system.

Secondly, we observe that the largest system (6x4x2) shows the smallest
percentage errors. This may be due to the fact that the rate averaging
approximation gives better results for a higher number of states. If the
trend of smaller errors with larger systems were verified for even larger
models, then we could conclude that our approximate models are more than
adequate for the study of large multibus systems.

In order to study the influence of the simplifying assumptions introduced,
and to test the performance of the approximate techniques on larger systems,
a simulation program was written in GPSS. Due to the peculiarities of the
language, some discrepancies are expected between the simulated systems and
the models for which we performed a Markov chain analysis. Nevertheless a
comparison between the analytic and the simulation results shows a very good
agreement. As an example, in fig. 38 results are shown for the 2x2x2 system.

The influence of the simplifying assumptions was studied taking the 6x4x2
system as a benchmark. Exact and approximate analytic results for this
system were shown in fig. 37. First, the impact of memory access time
distribution on system performance is tested. Fig. 39 shows the value of the
processing power - obtained via simulation - for a 6x4x2 system with fixed
memory access time. The fixed access time results are smaller than the
exponential access time results in fig. 37, as is expected from known
results in queueing theory.

Fig. 38 - Comparison of analytic and simulation results for a 2x2x2 system.

Next, the uniform memory reference assumption is relaxed by assuming that
access requests from any processor are directed to memory 1 with probability
d, and are uniformly distributed among all other memories. That is, we set:

$$
P_{i1} = d \quad \text{all } i, \qquad
P_{ij} = \frac{1-d}{m-1} \quad \text{all } i,\ j = 2, \ldots, m
$$

The results are reported in Fig. 40. We can see that the value d = 1/m is
the one that maximizes the processing power. This result was expected, since
high values of d imply that one memory is the bottleneck of the system,
whereas low values of d mean that the accesses are mainly directed to three
memories. Both situations increase memory contention and thus decrease
system throughput.
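The non-uniform reference pattern can be written down directly. The sketch
below builds one row of reference probabilities (memory 1 with probability
d, the remaining m-1 memories sharing 1-d equally) and checks that d = 1/m
gives back the uniform reference model; the function name is an
illustrative choice.

```python
from fractions import Fraction


def reference_probabilities(m, d):
    """Probabilities with which a processor references memories 1..m:
    d for memory 1 and (1 - d)/(m - 1) for each of the other memories."""
    d = Fraction(d)
    if m < 2:
        raise ValueError("at least two memories are needed")
    if not 0 <= d <= 1:
        raise ValueError("d must be a probability")
    return [d] + [(1 - d) / (m - 1)] * (m - 1)


if __name__ == "__main__":
    for d in (Fraction(1, 10), Fraction(1, 4), Fraction(7, 10)):
        row = reference_probabilities(4, d)
        print(f"d = {d}:", [str(x) for x in row])
```

For m = 4 the choice d = 1/4 yields four equal probabilities, while small d
spreads nearly all requests over the other three memories.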


The increase in efficiency gained by varying the number of busses was also
analyzed. Fig. 41 shows simulation results for a 6-processor, 4-memory
system using a number of busses varying from 1 to 4. The increase in
processing power is negligible for low values of $\rho$, but becomes very
significant for heavily loaded systems. In the latter case the increase in
performance clearly shows a "diminishing return" behaviour.

Finally, a 16-processor, 8-memory, 3-bus system was simulated, in order
to test the accuracy of the approximate models for large system size. Results
are shown in fig. 42. The approximate Markov chains of models A1 and C2,
having 46 and 17 states respectively, were solved. The results show that the
approximate models behave very well for a system of this size; indeed, the
approximate results are so close to the simulation results that they fall
within the simulation confidence interval. Moreover, since the system of
linear equations associated with the approximate Markov chain can be easily
solved with numerical methods, the approximate models require much less
computer time than a simulation program.
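
To illustrate how little computation such a solution needs, the sketch below finds the stationary distribution of a small continuous-time Markov chain by Gaussian elimination and derives the processing power from it. The chain used is a hypothetical single-bus, single-memory birth-death model, not model A1 or C2 of this report; the names `lam`, `mu`, `birth_death_generator`, `steady_state` and `processing_power` are illustrative assumptions.

```python
def birth_death_generator(p, lam, mu):
    """Generator matrix of a toy chain: state n = processors not active.

    Assumed rates (illustration only): each active processor issues a
    request at rate lam, and the single memory serves at rate mu.
    """
    size = p + 1
    Q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        if n < p:
            Q[n][n + 1] = (p - n) * lam
        if n > 0:
            Q[n][n - 1] = mu
        Q[n][n] = -sum(Q[n])  # diagonal makes each row sum to zero
    return Q


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    n = len(Q)
    # Balance equations are the rows of Q transposed; the last one is
    # redundant and is replaced by the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    rhs = [0.0] * n
    A[n - 1] = [1.0] * n
    rhs[n - 1] = 1.0
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = rhs[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = s / A[r][r]
    return pi


def processing_power(p, lam, mu):
    """Average number of active processors in the toy chain."""
    pi = steady_state(birth_death_generator(p, lam, mu))
    return sum((p - n) * pi[n] for n in range(p + 1))


print(processing_power(p=2, lam=1.0, mu=1.0))
```

A chain with a few dozen states, such as the 46-state and 17-state chains mentioned above, leads to a linear system of the same kind and of correspondingly small size.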
APPENDIX 1


In this appendix we give expressions for the transition rates of the
approximate Markov chain of model A1 in the general case of a pxmxb system.

Consider that, given that we are in state (i,j), transitions can occur
to at most four neighboring states:

$$(i+1,\ j) \qquad (i-1,\ j) \qquad (i,\ j+1) \qquad (i,\ j-1) \tag{A1.1}$$

and we denote such transitions, respectively, with the notation

$$i \to i+1 \qquad i \to i-1 \qquad j \to j+1 \qquad j \to j-1 \tag{A1.2}$$

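Using the simplifications introduced, we associate with these transitions the
following rates:

$$R(i \to i+1) = (p-i-j)\,\lambda\,[\ldots], \qquad 0 \le i < b,\ \ p-i-j > 0 \tag{A1.3}$$

$$R(i \to i-1) = \begin{cases} i\,\mu, & i < b \\[4pt] b\,\mu \left[\dfrac{b-1}{m}\right]^{j}, & i = b \end{cases} \tag{A1.4}$$

$$R(j \to j+1) = \begin{cases} (p-i-j)\,[\ldots]\,\lambda, & i < b,\ \ p-i-j > 0 \\[4pt] (p-b-j)\,\lambda, & i = b,\ \ p-b-j > 0 \end{cases} \tag{A1.5}$$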




APPENDIX  2 


In this appendix we give expressions for the number of states at level l
of the exact lumped chain in the case of a pxmx2 system.

We want to count the number of states that show some properties in order
to evaluate the transition rates of the approximate Markov models using the
averaging technique introduced in section 7.

The  level  of  a  state  is  defined  as  the  difference  between  the  total 
number  of  processors  and  the  number  of  active  processors. 

At levels 0 and 1 there is only one state; at level 2 there are two
states, one with one processor accessing common memory and one with two.

For $l \ge 3$ we know that we have one state with $n_m = 1$ (see eq. 15), but
we do not know how many states exist with $n_m = 2$. The number of such states
can be evaluated by applying some results in combinatorial analysis.

Define the numbers $p_k(n)$ by the recurrence relation

$$p_k(n) = p_k(n-k) + p_{k-1}(n-k) + \cdots + p_1(n-k) + p_0(n-k) \tag{A2.1}$$

with

$$\begin{aligned} p_k(n) &= 0, && n < k,\ \ k < 0 \\ p_0(n) &= 0, && n > 0 \\ p_k(k) &= 1, && k \ge 0 \end{aligned} \tag{A2.2}$$

Note that $p_k(n)$ is the number of unordered partitions of n into k parts,
with k and n integers.
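
The recurrence and its boundary conditions translate directly into a short memoized function. This is a minimal sketch, assuming the boundary conditions read p_k(n) = 0 for n < k or k < 0, p_0(n) = 0 for n > 0, and p_k(k) = 1 for k >= 0; the function name `partitions_into_parts` is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions_into_parts(k, n):
    """Number of unordered partitions of n into exactly k positive parts."""
    if k < 0 or n < k:
        return 0          # p_k(n) = 0 for n < k or k < 0
    if n == k:
        return 1          # p_k(k) = 1, which includes p_0(0) = 1
    if k == 0:
        return 0          # p_0(n) = 0 for n > 0
    # Recurrence: p_k(n) = p_k(n-k) + p_{k-1}(n-k) + ... + p_0(n-k).
    # Removing one unit from each of the k parts leaves a partition of
    # n - k into at most k parts.
    return sum(partitions_into_parts(j, n - k) for j in range(k + 1))


# 6 = 4+1+1 = 3+2+1 = 2+2+2, so there are three partitions into 3 parts.
print(partitions_into_parts(3, 6))  # 3
```

For k = 2 the function returns floor(n/2) for every n >= 2, since a partition of n into two parts is fixed by its smaller part.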
Now we can state that the number of states at level $l = k+2$, $l \ge 3$,
such that $n_m = 2$ in a pxmx2 system is:

$$n(m,2,k) = \sum_{j \ge 0} p_2(j+2)\; p_{m-2}(k-j+m-2), \qquad k \le p-2 \tag{A2.3}$$


Out of this number, some states will be such that no processor is queueing for
a bus to reach a free memory. The number of these states is:

$$n_0(m,2,k) = p_2(k+2), \qquad k \le p-2 \tag{A2.4}$$


On the contrary, the number of states such that some processor is queueing for
a bus is:

$$n_1(m,2,k) = \sum_{j=0}^{k-1} p_2(j+2)\; p_{m-2}(k-j+m-2), \qquad k \le p-2 \tag{A2.5}$$


Finally, the number of states at level $l = k+2$, $l \ge 3$, with some
processor queueing for a bus (if more than one, then all processors queueing
for the same memory module) and at least one queue for the busy memories
empty, is:

$$n_2(m,2,k) = \sum_{j=0}^{k-1} p_1(j+1) = k, \qquad k \le p-2 \tag{A2.6}$$


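In the particular case of three memories (m=3) the above results can be
put in polynomial form:

$$n(3,2,k) = \begin{cases} \dfrac{k^2}{4} + k + \dfrac{3}{4}, & k \text{ odd} \\[8pt] \dfrac{k^2}{4} + k + 1, & k \text{ even} \end{cases} \tag{A2.7}$$

$$n_0(3,2,k) = \begin{cases} \dfrac{k+1}{2}, & k \text{ odd} \\[8pt] \dfrac{k+2}{2}, & k \text{ even} \end{cases} \tag{A2.8}$$

$$n_1(3,2,k) = \begin{cases} \dfrac{k^2}{4} + \dfrac{k}{2} + \dfrac{1}{4}, & k \text{ odd} \\[8pt] \dfrac{k^2}{4} + \dfrac{k}{2}, & k \text{ even} \end{cases} \tag{A2.9}$$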
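
The polynomial forms for three memories can be checked numerically against the partition sums. The sketch below assumes the following reading of the counts: the total is the sum over j >= 0 of p_2(j+2) p_{m-2}(k-j+m-2), the count with no processor queueing for a bus is p_2(k+2), and the count with some processor queueing for a bus is the same sum restricted to j = 0, ..., k-1. All function names are illustrative, and the restriction k <= p - 2 is left to the caller.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p_parts(k, n):
    """Number of unordered partitions of n into exactly k positive parts."""
    if k < 0 or n < k:
        return 0
    if n == k:
        return 1
    if k == 0:
        return 0
    return sum(p_parts(j, n - k) for j in range(k + 1))


def total_states(m, k):
    """States at level k + 2 with two processors accessing memory."""
    # Terms with j > k vanish, so the sum stops at j = k.
    return sum(p_parts(2, j + 2) * p_parts(m - 2, k - j + m - 2)
               for j in range(k + 1))


def states_without_bus_queue(k):
    """States in which no processor is queueing for a bus."""
    return p_parts(2, k + 2)


def states_with_bus_queue(m, k):
    """States in which some processor is queueing for a bus."""
    return sum(p_parts(2, j + 2) * p_parts(m - 2, k - j + m - 2)
               for j in range(k))


def closed_forms_three_memories(k):
    """Polynomial forms for m = 3: (total, no bus queue, bus queue)."""
    if k % 2:  # k odd
        return ((k * k + 4 * k + 3) // 4, (k + 1) // 2, (k * k + 2 * k + 1) // 4)
    return ((k * k + 4 * k + 4) // 4, (k + 2) // 2, (k * k + 2 * k) // 4)


# Compare the sums with the polynomial forms for the first few values of k.
for k in range(12):
    sums = (total_states(3, k),
            states_without_bus_queue(k),
            states_with_bus_queue(3, k))
    assert sums == closed_forms_three_memories(k)
```

For example, k = 3 gives 6 states in total, 2 of them with no processor queueing for a bus and 4 with some processor queueing for a bus.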